Column 1: x coordinate of fish 1 (pixels)
Column 2: y coordinate of fish 1 (pixels)
Column 3: x coordinate of fish 2 (pixels)
Column 4: y coordinate of fish 2 (pixels)

Each row thus gives the positions of both fish in one frame of the video (29.97 fps), so consecutive rows are 1/29.97 s (about 33.4 ms) apart.
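
As an illustration of how this layout might be read, here is a minimal Python sketch. It rests on assumptions that are not stated above: fields are separated by whitespace or commas, there is no header row, and the first row is frame 0 at time 0. The function names and the two sample rows are hypothetical and are not taken from the data set.

```python
import math

FPS = 29.97  # frame rate given in the description above


def parse_rows(lines):
    """Parse text lines into (x1, y1, x2, y2) tuples of floats.

    Assumes whitespace- or comma-separated fields and no header row.
    """
    rows = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        x1, y1, x2, y2 = (float(v) for v in fields[:4])
        rows.append((x1, y1, x2, y2))
    return rows


def frame_times(n_frames, fps=FPS):
    """Time in seconds of each frame, assuming the first row is t = 0."""
    return [i / fps for i in range(n_frames)]


def inter_fish_distance(rows):
    """Euclidean distance between the two fish in each frame, in pixels."""
    return [math.hypot(x2 - x1, y2 - y1) for x1, y1, x2, y2 in rows]


# Made-up sample rows, for illustration only.
sample = ["100 200 103 204", "110 200 110 210"]
rows = parse_rows(sample)
times = frame_times(len(rows))
dists = inter_fish_distance(rows)
```

To read an actual file, the lines could be passed in with `parse_rows(open(path))`, where `path` is wherever the data file is stored. Distances stay in pixels, since no pixel-to-length calibration is given here.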
